
    Involuntary Embarrassing Exposures in Online Social Networks: A Replication Study

    In this study, we carry out a methodological replication of Choi et al. (2015), published in Information Systems Research. In the original study, the authors integrate the privacy and teasing literatures under a social exchange framework to understand involuntary online exposures. The original study was conducted on students from Southeast Asia; ours uses a substantially larger sample of college students in the United States. Our results show that while most of the original hypotheses on behavioral responses replicate with high consistency (8 of 12 hypotheses), those concerning the effects of network commonality on perceived privacy invasion and perceived relationship bonding did not replicate (4 of 12 hypotheses). These results could stem from a failed manipulation of network commonality; we examine possible explanations and show what an effective manipulation would look like in our context. Further, we extend the original study by testing an additional embarrassing scenario tailored to our subject pool. The results suggest that perceived privacy invasion and perceived relationship bonding shape individuals' behavioral responses to embarrassing exposures.

    Automatic Identification of Online Predators in Chat Logs by Anomaly Detection and Deep Learning

    Providing a safe environment for juveniles and children in online social networks is a major factor in improving public safety. Given the prevalence of online conversations, mitigating the harms of juvenile abuse in cyberspace has become unavoidable. Automating the detection of this kind of crime is challenging and demands efficient, scalable data mining techniques. The problem can be cast as a combination of textual preprocessing in data/text mining and binary classification in machine learning. This thesis proposes two machine learning approaches to address two issues in online predator identification. 1) The first is that gathering a comprehensive set of negative training samples is unrealistic given the nature of the problem. We address this by applying an existing method for semi-supervised anomaly detection that trains on only one class label; the method was tested on two datasets. 2) The second is improving the classification accuracy and F1-score of current binary classification methods. Here we customize a deep learning approach, the Convolutional Neural Network, for this domain and show that classification performance (F1-score) improves by almost 1.7% over a Support Vector Machine baseline. Two datasets were used in the empirical experiments: PAN-2012 and SQ (Sûreté du Québec). The former is a large public dataset used extensively in the literature; the latter is a small dataset collected from the Sûreté du Québec.
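    The one-class constraint above (training with only positive examples, since comprehensive negative samples are unrealistic) can be illustrated with a minimal sketch. This is a toy centroid-based inlier test over bag-of-words text, not the thesis's actual anomaly detection method or datasets; all training strings and the threshold are illustrative.

    ```python
    # Toy sketch of one-class (semi-supervised) text anomaly detection:
    # train on a single class only, flag inputs far from the class centroid.
    from collections import Counter
    import math

    def bag_of_words(text):
        return Counter(text.lower().split())

    def cosine(a, b):
        dot = sum(a[w] * b.get(w, 0) for w in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    class OneClassCentroid:
        """Fit on one class; score new text by similarity to the centroid."""
        def __init__(self, threshold=0.2):   # threshold is an assumption
            self.threshold = threshold
            self.centroid = Counter()

        def fit(self, texts):
            for t in texts:
                self.centroid.update(bag_of_words(t))

        def is_inlier(self, text):
            return cosine(bag_of_words(text), self.centroid) >= self.threshold

    # toy single-class training set (the key constraint in the thesis)
    model = OneClassCentroid()
    model.fit(["hey how old are you", "where do you live", "are you alone now"])
    print(model.is_inlier("how old are you"))
    print(model.is_inlier("gradient descent converges quadratically"))
    ```

    The same fit-on-one-class interface is what a production semi-supervised detector would expose; only the scoring function would be more sophisticated.
    
    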

    Sequential Gradient Coding For Straggler Mitigation

    In distributed computing, slower nodes (stragglers) usually become a bottleneck. Gradient Coding (GC), introduced by Tandon et al., is an efficient technique that uses principles of error-correcting codes to distribute gradient computation in the presence of stragglers. In this paper, we consider the distributed computation of a sequence of gradients {g(1), g(2), …, g(J)}, where processing of each gradient g(t) starts in round t and finishes by round (t+T). Here T ≥ 0 denotes a delay parameter. For the GC scheme, coding is only across computing nodes, which yields a solution with T = 0. On the other hand, having T > 0 allows for designing schemes which exploit the temporal dimension as well. In this work, we propose two schemes that improve on GC. Our first scheme combines GC with selective repetition of previously unfinished tasks and achieves improved straggler mitigation. In our second scheme, which constitutes our main contribution, we apply GC to a subset of the tasks and repetition to the remainder, then multiplex these two classes of tasks across workers and rounds adaptively, based on past straggler patterns. Through theoretical analysis, we show that the second scheme achieves a significant reduction in computational load. In our experiments, we study a practical setting of concurrently training multiple neural networks over an AWS Lambda cluster of 256 worker nodes, where our framework naturally applies. We demonstrate that the latter scheme can yield a 16% improvement in runtime over the baseline GC scheme in the presence of naturally occurring, non-simulated stragglers.
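    The base GC idea the paper builds on can be shown numerically with Tandon et al.'s classic n = 3 workers, s = 1 straggler example: each worker sends one linear combination of partial gradients, and the full gradient is recoverable from any 2 of the 3 workers. The gradient values below are toy numbers; the paper's temporal (T > 0) schemes layer on top of this.

    ```python
    # Minimal numeric sketch of gradient coding (n=3, s=1): any 2 of 3
    # workers' coded messages suffice to recover the full gradient.
    import numpy as np

    # partial gradients over three data partitions (toy values)
    g = [np.array([1.0, 2.0]), np.array([3.0, -1.0]), np.array([0.5, 4.0])]
    full = sum(g)

    # encoding matrix B: row i is the combination worker i transmits
    B = np.array([[0.5, 1.0,  0.0],
                  [0.0, 1.0, -1.0],
                  [0.5, 0.0,  1.0]])
    sent = [sum(B[i, j] * g[j] for j in range(3)) for i in range(3)]

    # decoding vectors a_S satisfying a_S @ B[S, :] = [1, 1, 1]
    decode = {(0, 2): [1.0, 1.0], (0, 1): [2.0, -1.0], (1, 2): [1.0, 2.0]}

    for survivors, a in decode.items():
        recovered = sum(c * sent[i] for c, i in zip(a, survivors))
        assert np.allclose(recovered, full)  # full gradient from any 2 workers
    print("recovered:", full)
    ```

    The key design constraint is that every size-(n−s) subset of rows of B must span the all-ones vector; the delay-tolerant schemes in the paper relax when each coded piece must arrive rather than how it is combined.
    
    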

    Evaluation of quality of life in patients with diabetes mellitus, based on its complications, referred to Emam Hossein Hospital, Shahroud

    Background and aim: Diabetes mellitus and its complications constitute a major health problem and the seventh leading cause of death in the United States; like other chronic diseases, beyond its high mortality it imposes substantial personal, family, social, and financial burdens. The present study was conducted to assess the quality of life of diabetic patients according to their diabetes complications. Methods: In this descriptive-analytical study, 150 patients with type II diabetes referred to Emam Hossein Hospital, Shahroud, were selected non-randomly according to the researcher's criteria and divided into 4 groups by type of complication (no complications, microvascular complications, macrovascular complications, and both micro- and macrovascular complications). Sample selection accounted for factors such as age, sex, and literacy. Quality of life in the physical, psychological, and social functioning dimensions was measured with the standardized ADDQoL (Audit of Diabetes Dependent Quality of Life) questionnaire and analyzed with descriptive and inferential statistics (independent t-test, one-way ANOVA, and Pearson correlation coefficient) using SPSS. Results: The mean age of participants was 59 years; 79.3% were married, 62.7% were illiterate, and 81.3% had an average monthly income below 100,000 tomans. Most participants (78.7%) reported having received no specific education about diabetes. The mean total quality-of-life score was 60.4±11.6 in the group without complications, 56.4±10.4 in the microvascular group, 61±8.8 in the macrovascular group, and 50.1±11.7 in the combined micro- and macrovascular group. Statistical tests showed no significant relationship between participants' mean quality-of-life scores and their demographic variables. Conclusion: The results showed that diabetes complications have significant adverse effects on all dimensions of patients' quality of life. Early diagnosis of diabetes and its long-term complications, together with appropriate treatment and care strategies to eliminate or reduce these complications, is therefore essential.

    Efficient Secure Aggregation for Privacy-Preserving Federated Machine Learning

    Federated learning introduces a novel approach to training machine learning (ML) models on distributed data while preserving users' data privacy: the model is distributed to clients, which train on their local data, and the final model is computed at a central server. To prevent data leakage from the local model updates, various secure aggregation protocols for privacy-preserving federated learning have been proposed. Despite their merits, most existing protocols still incur high communication and computation overhead on the participating entities and may not efficiently handle the large update vectors of ML models. In this paper, we present E-seaML, a novel secure aggregation protocol with high communication and computation efficiency. E-seaML requires only one round of communication in the aggregation phase and is up to 318x and 1224x faster for the user and the server, respectively, compared to its most efficient counterpart. E-seaML also allows efficient verification of the integrity of the final model by letting the aggregation server generate a proof of honest aggregation for the participating users. This efficiency and versatility are achieved by extending (and weakening) the honest-party assumption of existing works to a set of assisting nodes that aid the aggregation server in the aggregation process. Given the minimal computation and communication overhead on the assisting nodes, we also discuss how a rotating set of users could serve as assisting nodes in each iteration. We provide an open-source implementation of E-seaML for public verifiability and testing.
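    The core mechanism behind mask-based secure aggregation can be sketched in a few lines: each pair of users derives a shared mask, one adds it and the other subtracts it, so the masks cancel when the server sums all updates and individual vectors stay hidden. This is only the textbook cancellation idea under toy parameters; E-seaML's actual protocol (assisting nodes, one-round aggregation, aggregation proofs) is considerably richer.

    ```python
    # Toy sketch of pairwise-mask secure aggregation: the server learns the
    # sum of updates but no individual update.
    import random

    DIM, N_USERS = 4, 3
    updates = [[random.uniform(-1, 1) for _ in range(DIM)]
               for _ in range(N_USERS)]

    def pairwise_mask(i, j, dim):
        # both members of pair (i, j) derive the same mask from a shared seed
        # (a real protocol would use a key agreement + PRG, not this toy seed)
        rng = random.Random(1000 * min(i, j) + max(i, j))
        return [rng.uniform(-1, 1) for _ in range(dim)]

    def masked_update(i, update):
        out = list(update)
        for j in range(N_USERS):
            if j == i:
                continue
            mask = pairwise_mask(i, j, DIM)
            sign = 1 if i < j else -1   # opposite signs cancel in the sum
            out = [o + sign * m for o, m in zip(out, mask)]
        return out

    server_sum = [sum(col) for col in
                  zip(*(masked_update(i, u) for i, u in enumerate(updates)))]
    true_sum = [sum(col) for col in zip(*updates)]
    assert all(abs(a - b) < 1e-9 for a, b in zip(server_sum, true_sum))
    print([round(x, 3) for x in server_sum])
    ```

    Protocols in this family differ mainly in how they survive dropouts and how many communication rounds the seed setup costs, which is exactly where E-seaML's assisting-node assumption buys its efficiency.
    
    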

    Automated Deductive Content Analysis of Text: A Deep Contrastive and Active Learning Based Approach

    Content analysis traditionally involves human coders manually combing through text documents in search of relevant concepts and categories. This approach is time-intensive and does not scale, particularly for secondary data such as social media content, news articles, or corporate reports. To address this problem, the paper presents an automated framework, Automated Deductive Content Analysis of Text (ADCAT), that combines deep learning-based semantic techniques, an ontology of validated construct measures, a large language model, human-in-the-loop disambiguation, and a novel augmentation-based weighted contrastive learning approach for improved language representations into a scalable approach to deductive content analysis. We demonstrate the effectiveness of the proposed approach by identifying firm innovation strategies from 10-K reports, obtaining inferences reasonably close to human coding.
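    The contrastive-learning component can be illustrated with a plain (unweighted) InfoNCE/NT-Xent objective: embeddings of augmented views of the same text are pulled together while other texts are pushed apart. This is a generic sketch of the family of losses ADCAT builds on, not the paper's augmentation-weighted variant; all embeddings below are random toy data.

    ```python
    # Sketch of a simple contrastive (InfoNCE-style) loss over paired
    # embeddings: z1[i] and z2[i] are two views of the same text.
    import numpy as np

    def nt_xent(z1, z2, tau=0.5):
        z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
        z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
        sim = z1 @ z2.T / tau                       # pairwise similarities
        # positives sit on the diagonal; softmax cross-entropy per row
        log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    rng = np.random.default_rng(0)
    anchor = rng.normal(size=(8, 16))
    positive = anchor + 0.01 * rng.normal(size=(8, 16))  # near-identical views
    negative = rng.normal(size=(8, 16))                  # unrelated texts
    print(nt_xent(anchor, positive), nt_xent(anchor, negative))
    ```

    Aligned pairs should yield a markedly lower loss than unrelated pairs; a weighted variant would scale each pair's contribution by augmentation quality.
    
    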

    Multi-view Representation Learning from Malware to Defend Against Adversarial Variants

    Deep learning-based malware detectors have yielded promising results in detecting never-before-seen malware executables without relying on expensive dynamic behavior analysis and sandboxing. Despite these abilities, such detectors have been shown to be vulnerable to adversarial malware variants: meticulously modified, functionality-preserving versions of original malware executables generated by machine learning. Because of the nature of these adversarial modifications, adversarial methods often use a single view of malware executables (i.e., the binary/hexadecimal view) to generate adversarial malware variants. This gives defenders (i.e., malware detectors) an opportunity to detect the adversarial variants by utilizing more than one view of a malware file (e.g., a source code view in addition to the binary view). The rationale is that while the adversary focuses on the binary view, certain characteristics of the malware file in the source code view remain untouched, enabling detection of the adversarial variants. To capitalize on this opportunity, we propose Adversarially Robust Multiview Malware Defense (ARMD), a novel multi-view learning framework that improves the robustness of DL-based malware detectors against adversarial variants. Our experiments on three renowned open-source deep learning-based malware detectors across six common malware categories show that ARMD improves adversarial robustness by up to seven times on these detectors.
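    The multi-view rationale reduces to a simple fusion rule: an adversarial variant may drive the binary-view score down, but the untouched source-code view still fires. The sketch below uses made-up scores and a conservative max-fusion rule purely to illustrate the intuition; ARMD's actual framework learns the fusion rather than hard-coding it.

    ```python
    # Toy illustration of multi-view detection: flag a file if any view's
    # detector exceeds the threshold (scores and threshold are illustrative).
    def multiview_detect(binary_score, source_score, threshold=0.5):
        return max(binary_score, source_score) >= threshold

    # original malware: both views fire
    assert multiview_detect(0.9, 0.8)
    # adversarial variant: binary view evaded, source-code view untouched
    assert multiview_detect(0.2, 0.8)
    # benign file: neither view fires
    assert not multiview_detect(0.1, 0.1)
    ```
    
    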

    Two-Layer Coded Channel Access With Collision Resolution: Design and Analysis

    We propose a two-layer coding architecture for communication of multiple users over a shared slotted medium, enabling joint collision resolution and decoding. Each user first encodes its information bits with an outer code for reliability, and then transmits these coded bits, with possible repetitions, over transmission time slots of the access channel. The transmission patterns are dictated by the inner collision-resolution code, and collisions with other users' transmissions may occur. We analyze two types of codes for the outer layer: long-blocklength LDPC codes and short-blocklength algebraic codes. With LDPC codes, a density evolution analysis enables joint optimization of both outer and inner code parameters for maximum throughput. With algebraic codes, we invoke a similar analysis by approximating their average erasure-correcting capability under the assumption of a large number of active transmitters. The proposed low-complexity schemes operate at a significantly smaller gap to capacity than the state of the art. Our schemes apply both to a multiple access scenario where the number of users within a frame is known a priori, and to a random access scenario where that number is known only to the decoder. In the latter case, we optimize for an outage probability arising from the variability in user activity.
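    The inner collision-resolution layer works in the spirit of coded slotted ALOHA: users repeat packets across slots, and the receiver iteratively peels collisions, decoding any slot containing a single unresolved user and cancelling that user's replicas elsewhere. The sketch below shows only this peeling step on a toy slot assignment; the paper's joint design with LDPC/algebraic outer codes is not modeled.

    ```python
    # Toy peeling (successive interference cancellation) decoder for a
    # slotted repetition scheme: slot_map maps slot -> set of users there.
    def peel(slot_map):
        decoded = set()
        progress = True
        while progress:
            progress = False
            for users in slot_map.values():
                pending = users - decoded
                if len(pending) == 1:      # singleton slot: decode that user
                    decoded |= pending
                    progress = True        # its replicas are now cancellable
        return decoded

    # 3 users, 4 slots; slot 0 collides, but peeling resolves everyone:
    # slots 2 and 3 are singletons, which then clears the collision in slot 0
    slots = {0: {0, 1}, 1: {1, 2}, 2: {2}, 3: {0}}
    print(sorted(peel(slots)))  # → [0, 1, 2]
    ```

    Density evolution, as used in the paper, predicts the asymptotic fraction of users this peeling process resolves as a function of the repetition-degree distribution and load.
    
    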

    Comparison of coupled DEM-CFD and SPH-DEM methods in single and multiple particle sedimentation test cases

    In this paper, we investigate the capability of two major methods for modelling two-phase flow systems: coupled discrete element method and computational fluid dynamics (DEM-CFD), and coupled smoothed particle hydrodynamics and discrete element method (SPH-DEM). The particle phase is modelled using the discrete element method (DEM), while the fluid phase is described using either a mesh-based (CFD) or a mesh-less (SPH) method. Comparisons are performed to address algorithmic differences between these methods using a series of verification test cases, prior to their application to more complex systems. The present study describes a comprehensive verification of the fluid-particle simulations with two test cases: single particle sedimentation and sedimentation of a constant-porosity block. In each case, the simulation results show good agreement with the corresponding analytical solutions.
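    The single-particle sedimentation verification has a closed-form benchmark in the Stokes regime: the velocity relaxes as v(t) = v_t (1 − e^(−t/τ)) toward the terminal velocity v_t = g(ρ_p − ρ_f)d²/(18μ). The sketch below integrates the corresponding ODE with explicit Euler and compares against that analytical solution; the material values are illustrative, not those of the paper's test cases.

    ```python
    # Sketch of a single-particle sedimentation check under Stokes drag:
    # integrate dv/dt = a - b*v and compare with the analytical solution.
    import math

    g, mu = 9.81, 1.0e-3                       # gravity [m/s^2], viscosity [Pa s]
    rho_p, rho_f, d = 2500.0, 1000.0, 1.0e-4   # densities [kg/m^3], diameter [m]

    b = 18.0 * mu / (rho_p * d * d)   # drag relaxation rate 1/tau
    a = g * (1.0 - rho_f / rho_p)     # buoyancy-reduced gravity
    v_t = a / b                       # Stokes terminal velocity

    def v_analytic(t):
        return v_t * (1.0 - math.exp(-b * t))

    # explicit Euler integration (dt well below tau = 1/b for stability)
    v, t, dt = 0.0, 0.0, 1.0e-4
    while t < 0.05:
        v += dt * (a - b * v)
        t += dt

    rel_err = abs(v - v_analytic(t)) / v_t
    print("terminal velocity [m/s]:", v_t, "relative error:", rel_err)
    ```

    In a DEM-CFD or SPH-DEM run, the same comparison is made against the numerically resolved fluid-particle coupling instead of this one-line drag law.
    
    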